A heuristic for learning decision trees and pruning them into classification rules

Authors

  • José Ranilla
  • Oscar Luaces
  • Antonio Bahamonde
Abstract

Let us consider a set of training examples described by continuous or symbolic attributes with categorical classes. In this paper we present a measure of the potential quality of a region of the attribute space to be represented as a rule condition to classify unseen cases. The aim is to take into account the distribution of the classes of the examples. The resulting measure, called impurity level, is inspired by a similar measure used in the instance-based algorithm IB3 for selecting suitable paradigmatic exemplars that will classify, in a nearest-neighbor context, future cases. The features of the impurity level are illustrated using a version of Quinlan’s well-known C4.5 where the information-based heuristics are replaced by our measure. The experiments carried out to test the proposals indicate a very high accuracy reached with sets of classification rules as small as those found by Ripper.
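The abstract does not give the formula for the impurity level, but it does say the measure follows IB3's confidence-interval idea: a region (or exemplar) is considered good when the lower confidence bound on its accuracy exceeds the upper confidence bound on the base frequency of its class. A minimal sketch of that acceptance test, using Wilson score intervals (the function names, thresholds, and interval choice here are illustrative assumptions, not the paper's definitions):

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.645):
    """Wilson score interval for a proportion (z = 1.645 ~= 90% confidence).

    Illustrative helper; IB3-style criteria use intervals of this kind,
    but the exact interval and confidence levels are assumptions here.
    """
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

def region_acceptable(rule_hits: int, rule_total: int,
                      class_count: int, total: int) -> bool:
    """IB3-style test: accept a region when the lower bound of its
    accuracy interval beats the upper bound of the interval around
    the class's base frequency in the whole training set."""
    acc_low, _ = wilson_interval(rule_hits, rule_total)
    _, freq_high = wilson_interval(class_count, total)
    return acc_low > freq_high
```

For example, a region covering 50 examples with 45 of the target class would be accepted against a class that makes up 300 of 1000 training examples, because even the pessimistic estimate of the region's accuracy (about 0.81) is well above the optimistic estimate of the base frequency (about 0.32).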


Similar articles

A Trade-Off Between Depth and Impurity for Pruning Decision Trees

Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees which do not lessen the error rate can be relevant to point out some populations of specific interest or to give a representation of a large data file. We propose here a new pruning method (called pruning) which takes into account the complexity of sub-trees and which is able to ke...


J-measure Based Hybrid Pruning for Complexity Reduction in Classification Rules

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach, an alternative to the rule induction approach using decision trees, also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other...


Jmax-pruning: A facility for the information theoretic pruning of modular classification rules

The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in ...


Cost-sensitive Decision Trees with Post-pruning and Competition for Numeric Data

The decision tree is an effective classification approach in data mining and machine learning. In some applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, some cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, ICET and λ-ID3, have been proposed to deal with the issue. In this paper, we develop a decision tree algorit...


A quality index for decision tree pruning

Decision tree is a divide and conquer classification method used in machine learning. Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees which do not decrease the error rate can be relevant to point out some populations of specific interest or to give a representation of a large data file. We present here a new pruning method (call...



Journal:
  • AI Commun.

Volume 16, Issue —

Pages —

Published 2003